Rank | Count | Beginning |
---|---|---|
30277 | 9251 | Il |
47673 | 7642 | La |
66459 | 4280 | Nel |
40098 | 3371 | In |
29039 | 2881 | I |
56995 | 2360 | Le |
19472 | 2055 | E |
75860 | 1445 | Per |
199 | 1434 | A |
87361 | 1389 | Si |
17079 | 1333 | Dopo |
26515 | 960 | Gli |
82139 | 852 | Questo |
69979 | 847 | Nella |
95947 | 821 | Un |
18741 | 718 | Durante |
95948 | 702 | Una |
80090 | 645 | Quando |
72658 | 637 | Non |
4585 | 632 | Anche |
61156 | 617 | Lo |
11602 | 608 | Con |
1492 | 607 | Al |
28013 | 587 | Ha |
80986 | 580 | Questa |
62845 | 577 | Ma |
2271 | 555 | Alla |
13702 | 549 | Dal |
25124 | 533 | Fu |
13496 | 520 | Da |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N = 1.
The most frequent word N-grams at the beginning of sentences give some insight into sentence composition.
For N = 1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
SELECT SUBSTRING_INDEX(sentence, ' ', 1) AS beg, COUNT(*) AS cnt FROM sentences GROUP BY SUBSTRING_INDEX(sentence, ' ', 1) ORDER BY cnt DESC LIMIT 50;
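The same count can be reproduced outside the database for any N. The following is a minimal Python sketch, not the code used to build the table above. It assumes the sentences are available as a list of strings and, like the query, treats a single space as the word separator. The function name `sentence_beginnings` and the sample sentences are hypothetical and serve only as illustration.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Count the first n space-separated words of each sentence.

    Mirrors SUBSTRING_INDEX(sentence, ' ', n): a sentence with fewer
    than n words is counted as a whole.
    """
    counts = Counter(" ".join(s.split(" ", n)[:n]) for s in sentences)
    # most_common sorts by descending count, like ORDER BY cnt DESC LIMIT 50
    return counts.most_common(limit)


# Hypothetical sample data, not taken from the corpus
sample = [
    "Il gatto dorme.",
    "Il cane corre.",
    "La casa è grande.",
    "Nel 1990 fu fondata.",
]

top = sentence_beginnings(sample, n=1)
for beginning, count in top:
    print(count, beginning)
```

Calling `sentence_beginnings(sample, n=2)` gives the two-word beginnings in the same way, which corresponds to changing the third argument of `SUBSTRING_INDEX` in the query.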